Lab 7 & 8

Classroom distribution

Lab 7

Lab topics:

Confidence Intervals and Hypothesis Testing

  • t distribution
  • Confidence Intervals definition and interpretation
  • CI in hypothesis testing

Review:

  • A confidence interval (CI) is an estimate of a range of values that is likely to include the true population parameter of interest.
  • A CI is \(\text{CI} = \bar{x} \pm Z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}\)
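The formula above can be computed directly in R. A minimal sketch, with illustrative numbers (not from the lab data), assuming \(\sigma\) is known:

```r
# Sketch: a 95% z-based CI for a sample mean, assuming sigma is known.
# All numbers here are illustrative.
xbar  <- 10      # sample mean
sigma <- 3      # known population SD
n     <- 16     # sample size
alpha <- 0.05   # 1 - confidence level

z  <- qnorm(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
ci <- xbar + c(-1, 1) * z * sigma / sqrt(n)
ci
```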

Review:

  • We use a t-distribution instead of a normal distribution when we do not know the value of \(\sigma\).
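The difference this makes can be checked directly: t critical values exceed the normal one, especially for small samples. The sample sizes below are arbitrary:

```r
# Sketch: compare t and normal critical values at the 95% level.
alpha  <- 0.05
n      <- c(5, 15, 30, 100)               # arbitrary sample sizes
t_crit <- qt(1 - alpha / 2, df = n - 1)   # t critical values
z_crit <- qnorm(1 - alpha / 2)            # normal critical value, ~1.96

round(rbind(n = n, t = t_crit, z = z_crit), 3)
```

As n grows, the t critical value approaches the normal one, which is why the two CIs in the simulation below behave similarly for large n.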

A simulation:

Code
N <- 10000 # number of iterations
n <- 16 # sample size
m <- 10 # mean
s <- sqrt(9) # SD
alpha <- 0.05 # (1 - confidence level)

un <- nw <- matrix(NA, nrow = N, ncol = 2) # 2 blank matrices
evaluate <- evaluate.true <- rep(FALSE, N) # 2 blank vectors
in.CI <- function(x){ (x[1] <= m & m <= x[2]) }
# Define a function called in.CI. The input x is a 2-element vector representing an interval. If m is within the interval, in.CI returns TRUE; otherwise it returns FALSE.

for (i in 1:N){ # loop starts
Sample <- rnorm(n, m, s) # generate normal variates with given parameters
un[i,] <- mean(Sample) + c(-1, 1)*qt(1 - alpha/2, df = n - 1)*sd(Sample)/sqrt(n)
# Calculate the i-th confidence interval with estimated SD (t-based)
nw[i,] <- mean(Sample) + c(-1, 1)*qnorm(1 - alpha/2)*s/sqrt(n)
# Calculate the i-th confidence interval with known SD (z-based)
evaluate[i] <- in.CI(un[i,])
# Is m contained in the i-th CI when SD is unknown?
evaluate.true[i] <- in.CI(nw[i,])
# Is m contained in the i-th CI when SD is known?
}
sum(evaluate == FALSE)/N # proportion of t-based CIs that miss m
#> [1] 0.0492
sum(evaluate.true == FALSE)/N # proportion of z-based CIs that miss m
Code
library(dplyr)   # select, mutate, relocate
library(purrr)   # map_dfr
library(ggplot2) # plotting below

conf_int <- function(n = 100, mean = 0, sd = 1){
  sample <- rnorm(n = n, mean = mean, sd = sd)
  test <- t.test(sample)
  result <- broom::tidy(test) |>
    select(estimate, conf.low, conf.high, p.value)
  return(result)
}

set_intervals <- function(sample = 100, n = 100, mean = 0, sd = 1){
  
  intervals <- map_dfr(1:sample, ~ conf_int(n = n, mean = mean, sd = sd))
  
  intervals <- intervals |>
    mutate(id = 1:n(),
           result = ifelse(sign(conf.low) == sign(conf.high), "reject", "accept")) |>
    relocate(id)
  
  return(intervals)
}

set.seed(1111)
intervals <- set_intervals(sample = 20,
                           n = 20)

intervals |>
  ggplot(aes(estimate, id, color = result)) +
  geom_point() +
  geom_segment(aes(x = conf.low, y = id, xend = conf.high, yend = id, color = result)) +
  geom_vline(xintercept = 0,
             linetype = "dashed")

Hypothesis testing

  • A statistical hypothesis is a claim about the value of a parameter.

  • In any hypothesis-testing problem, there are two contradictory hypotheses to consider: null-hypothesis (\(H_0\)) and alternative hypothesis (\(H_a\)).

Hypothesis testing

  • There are two types of tests that we will deal with. If \(\mu\) is the true value and \(\mu_0\) is the postulated value we are testing against, the possible one-sided and two-sided tests are:
  • One-sided: \(H_0: \mu \geq \mu_0,\, H_a: \mu < \mu_0\) or \(H_0: \mu \leq \mu_0,\, H_a: \mu > \mu_0\)
  • Two-sided: \(H_0: \mu = \mu_0,\, H_a: \mu \neq \mu_0\)
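In R, `t.test()` covers all three cases through its `alternative` argument. A minimal sketch with simulated data (the postulated value 10 and the sample parameters are illustrative):

```r
# Sketch: the same sample tested against mu_0 = 10 under each alternative.
set.seed(1)
x <- rnorm(25, mean = 10.5, sd = 2)   # illustrative sample

p_two  <- t.test(x, mu = 10, alternative = "two.sided")$p.value
p_grtr <- t.test(x, mu = 10, alternative = "greater")$p.value # H_a: mu > 10
p_less <- t.test(x, mu = 10, alternative = "less")$p.value    # H_a: mu < 10

c(two.sided = p_two, greater = p_grtr, less = p_less)
```

Note that the two one-sided p-values sum to 1, and the two-sided p-value is twice the smaller of them.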

Hypothesis testing

Based on that, we create our null model.

Hypothesis testing

[Figure: the null model distribution, annotated with the significance level and the p-value.]
Hypothesis testing

When we increase \(n\), the null model distribution becomes narrower.
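This narrowing can be seen by simulating the sampling distribution of the mean under a null of \(\mu = 0,\, \sigma = 1\); the sample sizes below are arbitrary:

```r
# Sketch: the SD of the sample mean under H_0 shrinks like 1/sqrt(n).
set.seed(123)
means_n10  <- replicate(5000, mean(rnorm(10)))   # n = 10
means_n100 <- replicate(5000, mean(rnorm(100)))  # n = 100

sd(means_n10)   # close to 1/sqrt(10), about 0.316
sd(means_n100)  # close to 1/sqrt(100) = 0.100
```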

Lab 8

Lab topics:

ANOVA

  • ANOVA vs pairwise comparison
  • Hypothesis of ANOVA
  • Interpretation

Multiple comparisons

One-Way ANOVA

In this case, our hypothesis involves multiple means.

\[H_0: \mu_1 = \mu_2 = \cdots = \mu_k, \,\,\, k \geq 3.\]

\[H_a: \text{at least one mean is different.}\]

One-Way ANOVA

Assumptions and requirements:

  • ANOVA requires one continuous variable and another one that is categorical.
  • Residuals are normally distributed.
  • Variances of populations are equal (the largest group variance should be no more than a few times the smallest).
  • Responses for a given group are independent and identically distributed.
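A one-way ANOVA is fit in R with `aov()`. A minimal sketch on simulated data; the three group names and their means are made up for illustration:

```r
# Sketch: one-way ANOVA on simulated data with three groups,
# where group C is deliberately shifted upward.
set.seed(42)
df <- data.frame(
  y     = c(rnorm(20, mean = 5), rnorm(20, mean = 5), rnorm(20, mean = 6.5)),
  group = rep(c("A", "B", "C"), each = 20)
)

fit <- aov(y ~ group, data = df)
summary(fit)                        # F test of H_0: all group means equal
shapiro.test(residuals(fit))        # rough check of residual normality
```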

One-Way ANOVA

If p-value > \(\alpha\):

  • There are no significant differences between the groups.

If p-value < \(\alpha\):

  • There is a significant difference between at least one pair of the groups.
  • Pairwise t-tests will identify the significant difference(s).
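The follow-up step can be sketched with `pairwise.t.test()`, which applies a multiplicity correction; the simulated data below (group C shifted upward) are illustrative:

```r
# Sketch: pairwise comparisons after a significant ANOVA, with a
# Bonferroni correction for multiple testing.
set.seed(42)
y     <- c(rnorm(20, mean = 5), rnorm(20, mean = 5), rnorm(20, mean = 6.5))
group <- rep(c("A", "B", "C"), each = 20)

pw <- pairwise.t.test(y, group, p.adjust.method = "bonferroni")
pw
# TukeyHSD(aov(y ~ group)) is a common alternative for the same purpose
```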

One-Way ANOVA